3

New Things in Python

One of the most important steps in the history of Python was probably the release of Python 3.0. The most notable changes that happened in that release were:

  • Resolving multiple issues regarding text, data, and Unicode handling
  • Getting rid of old-style classes
  • Starting standard library reorganizations
  • Introducing function annotations
  • Introducing new syntax for exception handling

As we know from Chapter 1, Current Status of Python, Python 3 isn't backward-compatible with Python 2. This is the main reason why it took so many years for the Python community to fully embrace it. That was a tough, albeit necessary, lesson for Python core developers and the Python community.

Fortunately, problems associated with the adoption of Python 3 didn't stop the evolution of the language. Since December 3, 2008 (the official release date of Python 3.0), we've seen a steady stream of new major Python updates. Every new release has brought improvements to the language, its standard library, and its interpreter. Moreover, beginning with version 3.9, Python has adopted an annual release cycle. This means we will have access to new features and improvements every year.

If you want to learn more about the Python release cycle, read the PEP 602—Annual Release Cycle for Python document, available at https://www.python.org/dev/peps/pep-0602/.

In this chapter, we will take a closer look at the recent Python evolution. We will review a number of important additions across the latest few releases. We will also take a speculative look into the future and present a few features that have been accepted in the PEP process and will become an official part of the Python programming language in the very near future. Along the way, we'll cover the following topics:

  • Recent language additions
  • Not that new, but still shiny
  • What may come in the future?

But before we review those features, let's begin by considering the technical requirements.

Technical requirements

The following Python packages, mentioned in this chapter, can be downloaded from PyPI:

  • mypy
  • pyright

Information on how to install packages is included in Chapter 2, Modern Python Development Environments.

The code files for this chapter can be found at https://github.com/PacktPublishing/Expert-Python-Programming-Fourth-Edition/tree/main/Chapter%203.

Recent language additions

Every release of Python brings with it many changes of different types. Almost every release introduces some new syntax elements. However, the majority of the changes are related to Python's standard library, the CPython interpreter, the Python API, and CPython's C API. Due to space limitations, it is impossible to cover all of them in this book. That is why we will focus only on new syntax features and new additions to the standard library.

In terms of the two latest versions of Python, we can distinguish four main syntax updates:

  • Dictionary and merge update operators (added in Python 3.9)
  • Assignment expressions (added in Python 3.8)
  • Type hinting generics (added in Python 3.9)
  • Positional-only arguments (added in Python 3.8)

These four features are best described as quality-of-life improvements. They do not introduce any new programming paradigms, nor do they drastically change the way your code can be written. They simply allow for better coding patterns or enable stricter API definitions.

In recent years, Python core developers have been primarily focused on removing dead or redundant modules from the standard library rather than adding anything new. Still, from time to time, we see some standard library additions. Python 3.9 brought us two completely new modules:

  • The zoneinfo module for supporting the IANA (Internet Assigned Numbers Authority) time zone database (added in Python 3.9)
  • The graphlib module for operating with graph-like structures (added in Python 3.9)

Both modules are fairly small with regard to their API size. Later, we will discuss some example areas where you could apply them. But first, let's zoom in on the syntax updates introduced in Python 3.8 and Python 3.9.

Dictionary merge and update operators

Python allows the use of a number of selected arithmetic operators to manipulate the built-in container types, including lists, tuples, sets, and dictionaries.

For lists and tuples, you can use the + (addition) operator to concatenate two variables as long as they are the same type. There is also the += operator, which allows for the in-place modification of existing variables. The following transcript presents examples of the concatenation of lists and tuples in an interactive session:

>>> [1, 2, 3] + [4, 5, 6]
[1, 2, 3, 4, 5, 6]
>>> (1, 2, 3) + (4, 5, 6)
(1, 2, 3, 4, 5, 6)
>>> value = [1, 2, 3]
>>> value += [4, 5, 6]
>>> value
[1, 2, 3, 4, 5, 6]
>>> value = (1, 2, 3)
>>> value += (4, 5, 6)
>>> value
(1, 2, 3, 4, 5, 6)

When it comes to sets, there are exactly four binary operators (having two operands) that produce a new set:

  • Intersection operator: Represented by & (bitwise AND). This produces a set with the elements common to both sets.
  • Union operator: Represented by | (bitwise OR). This produces a set with all the elements of both sets.
  • Difference operator: Represented by - (subtraction). This produces a set with the elements of the left-hand set that are not in the right-hand set.
  • Symmetric difference operator: Represented by ^ (bitwise XOR). This produces a set with the elements that are in either of the sets, but not in both.

The following transcript presents examples of all four operations on sets in an interactive session:

>>> {1, 2, 3} & {1, 4}
{1}
>>> {1, 2, 3} | {1, 4}
{1, 2, 3, 4}
>>> {1, 2, 3} - {1, 4}
{2, 3}
>>> {1, 2, 3} ^ {1, 4}
{2, 3, 4}

For a very long time, Python didn't have a dedicated binary operator for producing a new dictionary from two existing dictionaries. Starting with Python 3.9, we can use the | (bitwise OR) and |= (in-place bitwise OR) operators to perform merge and update operations on dictionaries. That is now the idiomatic way of producing a union of two dictionaries. The reasoning behind adding the new operators was outlined in the PEP 584—Add Union Operators To Dict document.

A programming idiom is the common and most preferable way of performing specific tasks in a given programming language. Writing idiomatic code is an important part of Python culture. The Zen of Python says: "There should be one—and preferably only one—obvious way to do it."

We will discuss more idioms in Chapter 4, Python in Comparison with Other Languages.

In order to merge two dictionaries into a new dictionary, use the following expression:

dictionary_1 | dictionary_2

The resulting dictionary will be a completely new object that will have all the keys of both source dictionaries. If both dictionaries have overlapping keys, the resulting object will receive values from the rightmost object.

The following is an example of using this syntax on two dictionary literals, where the dictionary on the left is updated with values from the dictionary on the right:

>>> {'a': 1} | {'a': 3, 'b': 2}
{'a': 3, 'b': 2}

If you prefer to update the dictionary variable with the keys coming from a different dictionary, you can use the following in-place operator:

existing_dictionary |= other_dictionary

The following is an example of usage with a real variable:

>>> mydict = {'a': 1}
>>> mydict |= {'a': 3, 'b': 2}
>>> mydict
{'a': 3, 'b': 2}  

In older versions of Python, the simplest way to update an existing dictionary with the contents of another dictionary was to use the update() method, as in the following example:

existing_dictionary.update(other_dictionary)

This method modifies existing_dictionary in place and returns no value. This means that it does not allow the straightforward production of a merged dictionary as an expression and is always used as a statement.

The difference between expressions and statements will be explained in the Assignment expressions section.

Alternative – Dictionary unpacking

It is a little-known fact that Python already supported a fairly concise way to merge two dictionaries before version 3.9 through a feature known as dictionary unpacking. Support for dictionary unpacking in dict literals was introduced in Python 3.5 with PEP 448—Additional Unpacking Generalizations. The syntax for unpacking two (or more) dictionaries into a new object is as follows:

{**dictionary_1, **dictionary_2}

The example involving real literals is as follows:

>>> a = {'a': 1}; b = {'a': 3, 'b': 2}
>>> {**a, **b}
{'a': 3, 'b': 2}

This feature, together with list unpacking (using the *value syntax), may be familiar to those who have experience writing functions that accept an arbitrary number of arguments and keyword arguments, also known as variadic functions. This is especially useful when writing decorators.

We will discuss the topic of variadic functions and decorators in detail in Chapter 4, Python in Comparison with Other Languages.

You should remember that dictionary unpacking, while extremely popular in function definitions, is a rarely used method of merging dictionaries. It may confuse less experienced programmers reading your code. That is why you should prefer the new merge operator over dictionary unpacking in code that targets Python 3.9 and newer versions. For older versions of Python, it is sometimes better to use a temporary dictionary and a simple update() method, as shown in the sketch that follows.
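
The following is a minimal sketch of such a helper for code that must support pre-3.9 interpreters; the merged() name and the decision to accept any number of dictionaries are illustrative assumptions, not a standard recipe:

def merged(*dicts):
    # Merge any number of dictionaries into a brand new one.
    # Later dictionaries win on overlapping keys, mirroring the
    # right-hand-side precedence of the | operator.
    result = {}
    for d in dicts:
        result.update(d)
    return result

This works on any Python 3 version because it relies only on the plain dict type and its update() method.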

Alternative – ChainMap from the collections module

Yet another way to create an object that is, functionally speaking, a merge of two dictionaries is through the ChainMap class from the collections module. This is a wrapper class that takes multiple mapping objects (dictionaries in this instance) and acts as if it were a single mapping object.

The syntax for merging two dictionaries with ChainMap is as follows:

new_map = ChainMap(dictionary_2, dictionary_1)

Note that the order of dictionaries is reversed compared to the | operator. This means that if you try to access a specific key of the new_map object, it will perform lookups over wrapped objects in a left-to-right order. Consider the following transcript, which illustrates examples of operations using the ChainMap class:

>>> from collections import ChainMap
>>> user_account = {"iban": "GB71BARC20031885581746", "type": "account"}
>>> user_profile = {"display_name": "John Doe", "type": "profile"}
>>> user = ChainMap(user_account, user_profile)
>>> user["iban"]
'GB71BARC20031885581746'
>>> user["display_name"]
'John Doe'
>>> user["type"]
'account'

In the preceding example, we can clearly see that the resulting user object of the ChainMap type contains keys from both the user_account and user_profile dictionaries. If any of the keys overlap, the ChainMap instance will return the value of the leftmost mapping that has the specific key. That is the complete opposite of the dictionary merge operator.

ChainMap is a wrapper object. This means that it doesn't copy the contents of the source mappings provided, but stores references to them instead. As a consequence, if the underlying objects change, ChainMap will return the modified data. Consider the following continuation of the previous interactive session:

>>> user["display_name"]
'John Doe'
>>> user_profile["display_name"] = "Abraham Lincoln"
>>> user["display_name"]
'Abraham Lincoln'

Moreover, ChainMap is writable and propagates changes back to the underlying mappings. What you need to remember is that writes, updates, and deletes only affect the leftmost mapping. If used without proper care, this can lead to some confusing situations, as in the following continuation of the previous session:

>>> user["display_name"] = "John Doe"
>>> user["age"] = 33
>>> user["type"] = "extension"
>>> user_profile
{'display_name': 'Abraham Lincoln', 'type': 'profile'}
>>> user_account
{'iban': 'GB71BARC20031885581746', 'type': 'extension', 'display_name': 'John Doe', 'age': 33}

In the preceding example, we can see that the 'display_name' key was written to the user_account dictionary, even though user_profile was the initial source dictionary holding such a key. In many contexts, such write-back behavior of ChainMap is undesirable. That's why the common idiom for using it for the purpose of merging two dictionaries actually involves explicit conversion to a new dictionary. The following is an example that uses the previously defined input dictionaries:

>>> dict(ChainMap(user_account, user_profile))
{'display_name': 'John Doe', 'type': 'account', 'iban': 'GB71BARC20031885581746'}

If you simply want to merge two dictionaries, you should prefer the new merge operator over ChainMap. However, this doesn't mean that ChainMap is completely useless. If the back-and-forth propagation of changes is your desired behavior, ChainMap will be the class to use. Also, ChainMap works with any mapping type. So, if you need to provide unified access to multiple objects that act as if they were dictionaries, ChainMap enables a single merged view of them all.
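
For instance, ChainMap can layer os.environ (which is a mapping, but not a plain dictionary) over a dictionary of defaults, so that environment variables transparently override fallback values. The following is a small sketch; the configuration keys are made up for illustration:

import os
from collections import ChainMap

defaults = {"HOST": "localhost", "PORT": "8080"}
config = ChainMap(os.environ, defaults)

# Returns the environment variable if it is set,
# and falls back to the default value otherwise
print(config["HOST"])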

If you have a custom dict-like class, you can always extend it with the special __or__() method to provide compatibility with the | operator instead of using ChainMap. Overriding special methods will be covered in Chapter 4, Python in Comparison with Other Languages. Anyway, using ChainMap is usually easier than writing a custom __or__() method and will allow you to work with pre-existing object instances of classes that you cannot modify.
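
The following is a minimal sketch of how a custom dict subclass could implement such compatibility; in real code, your class would presumably add its own behavior on top:

class MyMapping(dict):
    def __or__(self, other):
        # Start from a copy of self, then overlay the other mapping;
        # values from the right-hand operand win on overlapping keys
        merged = MyMapping(self)
        merged.update(other)
        return merged

    def __ror__(self, other):
        # Called for `plain_dict | MyMapping(...)` when the left
        # operand doesn't handle the operation itself
        merged = MyMapping(other)
        merged.update(self)
        return merged

With such a class, MyMapping({"a": 1}) | {"a": 3, "b": 2} produces a new MyMapping instance equal to {'a': 3, 'b': 2}.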

Usually, the most important reason for using ChainMap over dictionary unpacking or the union operator is backward compatibility. On Python versions older than 3.9, you won't be able to use the new dictionary merge operator syntax. So, if you have to write code for older versions of Python, use ChainMap. If you don't, it is better to use the merge operator.

Another syntax change that has a big impact on backward compatibility is assignment expressions.

Assignment expressions

Assignment expressions are a fairly interesting feature because their introduction affected the fundamental part of Python syntax: the distinction between expressions and statements. Expressions and statements are the key building blocks of almost every programming language. The difference between them is really simple: expressions have a value, while statements do not.

Think of statements as consecutive actions or instructions that your program executes. So, value assignments, if clauses, together with for and while loops, are all statements. Function and class definitions are statements, too.

Think of expressions as anything that can be put into an if clause. Typical examples of expressions are literals, values returned by operators (excluding in-place operators), and comprehensions, such as list, dictionary, and set comprehensions. Function calls and method calls are expressions, too.

Some elements of many programming languages are inseparably bound to statements. These typically include:

  • Functions and class definitions
  • Loops
  • if...else clauses
  • Variable assignments

Python was able to break that barrier by providing syntax features that were expression counterparts of such language elements, namely:

  • Lambda expressions for anonymous functions as a counterpart for function definitions:
    lambda x: x**2
    
  • Type object instantiation as a counterpart for class definition:
    type("MyClass", (), {})
    
  • Various comprehensions as a counterpart for loops:
    squares = [x**2 for x in range(10)]
    
  • Conditional expressions as a counterpart for if … else statements:
    "odd" if number % 2 else "even"
    

For many years, however, we lacked syntax that would convey the semantics of assigning a value to a variable in the form of an expression, and this was undoubtedly a conscious design choice on the part of Python's creators. In languages such as C, where variable assignment can be used both as an expression and as a statement, this often leads to situations where the assignment operator is confused with the equality comparison operator. Anyone who has programmed in C can attest to the fact that this is a really annoying source of errors. Consider the following example of C code:

    int err = 0;
    if (err = 1) {
        printf("Error occured");
    }

And compare it with the following:

    int err = 0;
    if (err == 1) {
        printf("Error occured");
    }

Both are syntactically valid in C because err = 1 is an expression in C that will evaluate to the value 1. Compare this with Python, where the following code will result in a syntax error:

err = 0
if err = 1:
     printf("Error occured")

On rare occasions, however, it may be really handy to have a variable assignment operation that would evaluate to a value. Luckily, Python 3.8 introduced the dedicated := operator, which assigns a value to the variable but acts as an expression instead of a statement. Due to its visual appearance, it was quickly nicknamed the walrus operator.

The use cases for this operator are, quite frankly, limited, but they do help to make code more concise. And often, more concise code is easier to understand because it improves the signal-to-noise ratio. The most common scenario for the walrus operator is when a complex value needs to be evaluated and then immediately used in the statements that follow.

A commonly referenced example is working with regular expressions. Let's imagine a simple application that reads source code written in Python and scans it with regular expressions looking for imported modules.

Without the use of assignment expressions, the code could appear as follows:

import os
import re
import sys
import_re = re.compile(
    r"^\s*import\s+\.{0,2}((\w+\.)*(\w+))\s*$"
)
import_from_re = re.compile(
    r"^\s*from\s+\.{0,2}((\w+\.)*(\w+))\s+import\s+(\w+|\*)+\s*$"
)
if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {os.path.basename(__file__)} file-name")
        sys.exit(1)
    with open(sys.argv[1]) as file:
        for line in file:
            match = import_re.search(line)
            if match:
                print(match.groups()[0])
            match = import_from_re.search(line)
            if match:
                print(match.groups()[0])

As you can observe, we had to repeat the same pattern twice: evaluating the match of a complex regular expression and then retrieving the grouped tokens if a match was found. That block of code can be rewritten with assignment expressions in the following way:

if match := import_re.search(line):
    print(match.groups()[0])
if match := import_from_re.search(line):
    print(match.groups()[0])

As you can see, there is a small improvement in terms of readability, but it isn't dramatic. This type of change really shines in situations where you need to repeat the same pattern multiple times. The continuous assignment of temporary results to the same variable can make code look unnecessarily bloated.
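
A classic illustration is reading a file in fixed-size chunks. Without an assignment expression, the read call has to be written twice; with one, the loop condition does all the work. In the following sketch, the process() function is hypothetical:

# Without the walrus operator: the read call is duplicated
with open("data.bin", "rb") as file:
    chunk = file.read(8192)
    while chunk:
        process(chunk)
        chunk = file.read(8192)

# With the walrus operator: a single read expression,
# re-evaluated on every iteration
with open("data.bin", "rb") as file:
    while chunk := file.read(8192):
        process(chunk)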

Another use case could be reusing the same data in multiple places in larger expressions. Consider the example of a dictionary literal that represents some predefined data of an imaginary user:

first_name = "John"
last_name = "Doe"
height = 168
weight = 70
user = {
    "first_name": first_name,
    "last_name": last_name,
    "display_name": f"{first_name} {last_name}",
    "height":  height,
    "weight": weight,
    "bmi": weight / (height / 100) ** 2,
}

Let's assume that in our situation, it is important to keep all the elements consistent. Hence, the display name should always consist of a first name and a last name, and the BMI should be calculated on the basis of weight and height. In order to prevent us from making a mistake when editing specific data components, we had to define them as separate variables. These are no longer required once a dictionary has been created. Assignment expressions enable the preceding dictionary to be written in a more concise way:

user = {
    "first_name": (first_name := "John"),
    "last_name": (last_name := "Doe"),
    "display_name": f"{first_name} {last_name}",
    "height": (height := 168),
    "weight": (weight := 70),
    "bmi": weight / (height / 100) ** 2,
}

As you can see, we had to wrap the assignment expressions in parentheses. Unfortunately, the := syntax clashes with the : character used as an association operator in dictionary literals, and parentheses are a way around that.

Assignment expressions are a tool for polishing your code and nothing more. Always make sure that once applied, they actually improve readability, instead of making it more obscure.

Type-hinting generics

Type-hinting annotations, although completely optional, are an increasingly popular feature of Python. They allow you to annotate variable, argument, and function return types with type definitions. These type annotations serve documentation purposes, but can also be used to validate your code with external tools. Many programming IDEs are able to understand typing annotations and visually highlight potential typing problems. There are also static type checkers, such as mypy or pyright, that can be used to scan a whole code base and report typing errors in code units that use annotations.

The story of the mypy project is very interesting. It began life as the Ph.D. research of Jukka Lehtosalo, but it really started to take shape when he started working on it together with Guido van Rossum (Python creator) at Dropbox. You can learn more about that story from the farewell letter to Guido on Dropbox's tech blog at https://blog.dropbox.com/topics/company/thank-you--guido.

In its simplest form, type hinting can be used in conjunction with built-in or custom types to specify the desired types of function input arguments and return values, as well as local variables. Consider the following function, which performs a case-insensitive lookup of keys in a string-keyed dictionary:

from typing import Any
def get_ci(d: dict, key: str) -> Any:
    for k, v in d.items():
        if key.lower() == k.lower():
            return v

The preceding example is, of course, a naïve implementation of a case-insensitive lookup. If you would like to do this in a more performant way, you would probably require a dedicated class. We will eventually revisit this problem later in the book.

The first statement imports the Any type from the typing module, which states that the variable or argument can be of any type. The signature of our function specifies that the first argument, d, should be a dictionary, while the second argument, key, should be a string. The signature ends with the specification of the return value, which can be of any type.

If you're using type checking tools, the preceding annotations will be sufficient to detect many mistakes. If, for instance, a caller switches the order of positional arguments, you will be able to detect the error quickly, as the key and d arguments are annotated with different types. However, these tools will not complain in a situation where a user passes a dictionary that uses different types for keys.

For that very reason, generic types such as tuple, list, dict, set, frozenset, and many more can be further annotated with types of their content. For a dictionary, the annotation has the following form:

dict[KeyType, ValueType]

The signature of the get_ci() function, with more restrictive type annotations, would be as follows:

def get_ci(d: dict[str, Any], key: str) -> Any: ...
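
To see the benefit, consider a call that violates the annotated key type. A static type checker such as mypy should flag it; the following is a sketch, and the exact diagnostic message differs between tools and versions:

from typing import Any

def get_ci(d: dict[str, Any], key: str) -> Any:
    for k, v in d.items():
        if key.lower() == k.lower():
            return v

currencies = {1: "USD", 2: "EUR"}  # keys are int, not str
# At runtime this would raise AttributeError inside get_ci();
# mypy catches the incompatible argument type statically
get_ci(currencies, "usd")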

In older versions of Python, built-in collection types could not be annotated so easily with types of their content. The typing module provides special types that can be used for that purpose. These types include:

  • typing.Dict for dictionaries
  • typing.List for lists
  • typing.Tuple for tuples
  • typing.Set for sets
  • typing.FrozenSet for frozen sets

These types are still useful if you need to support a wide spectrum of Python versions, but if you're writing code for Python 3.9 and newer releases only, you should use the built-in generics instead. Importing those types from the typing module is deprecated, and they will be removed from Python in the future.

We will take a closer look at typing annotations in Chapter 4, Python in Comparison with Other Languages.

Positional-only parameters

Python is quite flexible when it comes to passing arguments to functions. There are two ways in which arguments can be provided to a function:

  • As a positional argument
  • As a keyword argument

For many functions, it is the caller's choice how arguments are passed. This is a good thing, because the user of the function can decide that a specific usage is more readable or convenient in a given situation. Consider the following example of a function that concatenates two strings using a delimiter:

def concatenate(first: str, second: str, delim: str):
    return delim.join([first, second])

There are multiple ways in which this function can be called:

  • With positional arguments: concatenate("John", "Doe", " ")
  • With keyword arguments: concatenate(first="John", second="Doe", delim=" ")
  • With a mix of positional and keyword arguments: concatenate("John", "Doe", delim=" ")

If you are writing a reusable library, you may already know how your library is intended to be used. Sometimes, you may know from experience that specific usage patterns will make the resulting code more readable, or quite the opposite. You may not be certain about your design yet and want to make sure that the API of your library can be changed within a reasonable time frame without affecting your users. In either case, it is good practice to create function signatures in a way that supports the intended usage and also allows for future extension.

Once you publish your library, its function signatures form a usage contract with your users. Any change to the argument names and their ordering can break the applications of programmers using that library.

If you were to realize at some point in time that the argument names first and second don't properly explain their purpose, you cannot change them without breaking backward compatibility. That's because there may be a programmer who used the following call:

concatenate(first="John", second="Doe", delim=" ")

If you want to convert the function into a form that accepts any number of strings, you can't do that without breaking backward compatibility because there might be a programmer who used the following call:

concatenate("John", "Doe", " ")

Fortunately, Python 3.8 added the option to define specific arguments as positional-only. This way, you may denote which arguments cannot be passed as keyword arguments in order to avoid backward compatibility issues in the future. You can also denote specific arguments as keyword-only. Careful consideration as to which arguments should be passed as positional-only and which as keyword-only makes function definitions more amenable to future changes. Our concatenate() function, defined with the use of positional-only and keyword-only arguments, could look as follows:

def concatenate(first: str, second: str, /, *, delim: str):
    return delim.join([first, second])

The way in which you read this definition is as follows:

  • All arguments preceding the / mark are positional-only arguments
  • All arguments following the * mark are keyword-only arguments

The preceding definition ensures that the only valid call to the concatenate() function would be in the following form:

concatenate("John", "Doe", delim=" ")

And if you were to try to call it differently, you would receive a TypeError exception, as in the following example:

>>> concatenate("John", "Doe", " ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: concatenate() takes 2 positional arguments but 3 were given

Let's assume that we've published our function in a library in the preceding form and now we want it to accept an unlimited number of positional arguments. As there is only one way in which this function can be used, we can now use argument unpacking to implement the following change:

def concatenate(*items, delim: str):
    return delim.join(items)

The *items argument will capture all the positional arguments in the items tuple. Thanks to this change, users will be able to call the function with a variable number of positional items, as in the following examples:

>>> concatenate("John", "Doe", delim=" ")
'John Doe'
>>> concatenate("Ronald", "Reuel", "Tolkien", delim=" ")
'Ronald Reuel Tolkien'
>>> concatenate("Jay", delim=" ")
'Jay'
>>> concatenate(delim=" ")
''

Positional-only and keyword-only arguments are a great tool for library creators as they create some space for future design changes that won't affect their users. But they are also a great tool for writing applications, especially if you work with other programmers. You can utilize positional-only and keyword-only arguments to make sure that functions will be invoked as intended. This may help in future code refactoring.

zoneinfo module

Handling time and time zones is one of the most challenging aspects of programming. The main reasons are the numerous common misconceptions that programmers have about time and time zones, and the never-ending stream of updates to actual time zone definitions. These changes happen every year, often for political reasons.

Python, starting from version 3.9, makes access to the information regarding current and historical time zones easier than ever. The Python standard library provides a zoneinfo module that is an interface to the time zone database either provided by your operating system or obtained as a first-party tzdata package from PyPI.

Packages from PyPI are usually considered third-party packages, while standard library modules are considered first-party. tzdata is quite unique because it is maintained by CPython's core developers. The reason for extracting the contents of the IANA database to a separate package on PyPI is to ensure regular updates that are independent of CPython's release cadence.

Actual usage involves creating ZoneInfo objects using the following constructor call:

ZoneInfo(timezone_key)

Here, timezone_key is a filename from IANA's time zone database. These filenames resemble the way in which time zones are often presented in various applications. Examples include:

  • Europe/Warsaw
  • Asia/Tel_Aviv
  • America/Fort_Nelson
  • GMT-0

Instances of the ZoneInfo class can be used as a tzinfo parameter of the datetime object constructor, as in the following example:

from datetime import datetime
from zoneinfo import ZoneInfo
dt = datetime(2020, 11, 28, tzinfo=ZoneInfo("Europe/Warsaw"))

This allows you to create so-called time zone-aware datetime objects. Time zone-aware datetime objects are essential in properly calculating the time differences in specific time zones because they are able to take into account things such as changes between standard and daylight-saving time, together with any historical changes made to IANA's time zone database.

You can obtain a full list of all the time zones available in your system using the zoneinfo.available_timezones() function.
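
The following sketch demonstrates both points. It relies on the fact, recorded in the IANA database, that Poland switched from CEST (UTC+2) back to CET (UTC+1) on October 25, 2020:

from datetime import datetime, timedelta
from zoneinfo import ZoneInfo, available_timezones

warsaw = ZoneInfo("Europe/Warsaw")
before = datetime(2020, 10, 24, 12, 0, tzinfo=warsaw)
after = before + timedelta(days=1)

# The UTC offset changes across the daylight-saving transition
print(before.utcoffset())  # 2:00:00
print(after.utcoffset())   # 1:00:00

# Time zone keys can also be discovered programmatically
print("Europe/Warsaw" in available_timezones())  # True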

graphlib module

Another interesting addition to the Python standard library is the graphlib module, added in Python 3.9. This is a module that provides utilities for working with graph-like data structures.

A graph is a data structure consisting of nodes connected by edges. Graphs are a concept from the field of mathematics known as graph theory. Depending on the edge type, we can distinguish between two main types of graphs:

  • An undirected graph is a graph where every edge is undirected. If a graph was a system of cities connected by roads, the edges in an undirected graph would be two-way roads that can be traversed from either side. So, if two nodes, A and B, are connected by an edge, E, in an undirected graph, you can traverse from A to B and from B to A using the same edge, E.
  • A directed graph is a graph where every edge is directed. Again, if a graph was a system of cities connected by roads, the edges in a directed graph would be one-way roads that can be traversed from a single point of origin only. If two nodes, A and B, are connected by a single edge, E, that starts from node A, you can traverse from A to B using that edge, but can't traverse from B to A.

Moreover, graphs can be either cyclic or acyclic. A cyclic graph is a graph that has at least one cycle—a closed path that starts and ends at the same node. An acyclic graph is a graph that does not have any cycles. Figure 3.1 presents example representations of directed and undirected graphs:


Figure 3.1: Visual representations of various graph types

Graph theory deals with many mathematical problems that can be modeled using graph structures. In programming, graphs are used to solve many algorithmic problems. In computer science, graphs can be used to represent the flow of data or relationships between objects. This has many practical applications, including:

  • Modeling dependency trees
  • Representing knowledge in a machine-readable format
  • Visualizing information
  • Modeling transportation systems

The graphlib module is supposed to aid Python programmers when working with graphs. This is a new module, so it currently only includes a single utility class named TopologicalSorter. As the name suggests, this class is able to perform a topological sort of directed acyclic graphs.

Topological sorting is the operation of ordering nodes of a Directed Acyclic Graph (DAG) in a specific way. The result of topological sorting is a list of all nodes where every node appears before all the nodes that you can traverse to from that node, in other words:

  • The first node will be the node that cannot be traversed to from any other node
  • Every next node will be a node from which you cannot traverse to previous nodes
  • The last node will be a node from which you cannot traverse to any node

Some graphs may have multiple orderings that satisfy the requirements of topological sorting. Figure 3.2 presents an example DAG with three possible topological orderings:

Figure 3.2: Various ways to sort the same graph topologically

To better understand the use of topological sorting, let's consider the following problem. We have a complex operation to execute that consists of multiple dependent tasks. This job could be, for instance, migrating multiple database tables between two different database systems. This is a well-known problem, and there are already multiple tools that can migrate data between various database management systems. But for the sake of illustration, let's assume that we don't have such a system and need to build something from scratch.

In relational database systems, rows in tables are often cross-referenced, and the integrity of those references is guarded by foreign key constraints. If we would like to ensure that, at any given point in time, the target database maintains referential integrity, we would have to migrate all the tables in a specific order. Let's assume we have the following database tables:

  • A customers table, which holds personal information pertaining to customers.
  • An accounts table, which holds information about user accounts, including their balances. A single user can have multiple accounts (for instance, personal and business accounts), and the same account cannot be accessed by multiple users.
  • A products table, which holds information on the products available for sale in our system.
  • An orders table, which holds individual orders of multiple products within a single account made by a single user.
  • An order_products table, which holds information regarding the quantities of individual products within a single order.

Python does not have any special data type dedicated to representing graphs. But it has a dictionary type that is great at mapping relationships between keys and values. Let's define the references between our imaginary tables:

table_references = {
    "customers": set(),
    "accounts": {"customers"},
    "products": set(),
    "orders": {"accounts", "customers"},
    "order_products": {"orders", "products"},
}

If our reference graph does not have cycles, we can sort it topologically. The result of that sorting will be a possible table migration order. The constructor of the graphlib.TopologicalSorter class accepts as input a single dictionary in which keys are graph nodes and values are sets of their predecessors, that is, the nodes that must appear earlier in the sorted order. This means that we can pass our table_references variable directly to the TopologicalSorter() constructor. To perform a topological sort, we can use the static_order() method, as in the following transcript from an interactive session:

>>> from graphlib import TopologicalSorter
>>> table_references = {
...     "customers": set(),
...     "accounts": {"customers"},
...     "products": set(),
...     "orders": {"accounts", "customers"},
...     "order_products": {"orders", "products"},
... }
>>> sorter = TopologicalSorter(table_references)
>>> list(sorter.static_order())
['customers', 'products', 'accounts', 'orders', 'order_products']

Topological sorting can be performed only on DAGs. TopologicalSorter doesn't check for the existence of cycles during initialization, although it will detect cycles during sorting. If a cycle is found, the static_order() method will raise a graphlib.CycleError exception.
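
For example, adding a back-reference from customers to orders would create a cycle in our reference graph. The following minimal sketch shows the resulting failure mode:

from graphlib import CycleError, TopologicalSorter

cyclic_references = {
    "customers": {"orders"},  # this back-reference creates a cycle
    "orders": {"customers"},
}
try:
    list(TopologicalSorter(cyclic_references).static_order())
except CycleError as error:
    print(f"cannot sort: {error}")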

Our example was, of course, straightforward and fairly easy to solve by hand. However, real databases often consist of dozens or even hundreds of tables. Preparing such a plan manually for databases that big would be a very tedious and error-prone task.

The features we've reviewed so far are quite new, so it will take some time until they become the mainstream elements of Python. That's because they are not backward compatible, and older versions of Python are still supported by many library maintainers.

In the next section, we will review a number of important Python elements introduced in Python 3.6 and Python 3.7, so we will definitely have wider Python version coverage. Not all of these new elements are popular though, so I hope you will still learn something.

Not that new, but still shiny

Every Python release brings something new. Some changes are real revelations; they greatly improve the way we can program and are adopted almost instantly by the community. The benefits of other changes, however, may not be obvious at the beginning and they may require a little more time to really take off.

We've seen this happening with function annotations that were part of Python from the very first 3.0 release. It took years to build an ecosystem of tools that would leverage them. Now, annotations seem almost ubiquitous in modern Python applications.

The core Python developers are very conservative about adding new modules to the standard library and we rarely see new additions. Still, chances are that you will soon forget about using the graphlib or zoneinfo modules if you don't have the opportunity to work with problems that require manipulating graph-like data structures or the careful handling of time zones. You may have already forgotten about other nice additions to Python that have happened over the past few years. That's why we will do a brief review of a few important changes that happened in versions older than Python 3.7. These will either be small but interesting additions that could easily be missed, or things that simply take time to get used to.

breakpoint() function

We discussed the topic of debuggers in Chapter 2, Modern Python Development Environments. The breakpoint() function was already mentioned there as an idiomatic way of invoking the Python debugger.

It was added in Python 3.7, so it has already been available for quite some time. Still, it is one of those changes that simply take some effort to get used to. We've been told and taught for many years that the simplest way to invoke the debugger from Python code is via the following snippet:

import pdb; pdb.set_trace()

It doesn't look pretty, nor does it look straightforward but, if you've been doing that every day for years, as many programmers have, you would have that in your muscle memory. Problem? Jump to the code, input a few keystrokes to invoke pdb, and then restart the program. Now you're in the interpreter shell at the very same spot as your error occurs. Done? Go back to the code, remove import pdb; pdb.set_trace(), and then start working on your fix.
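
With the built-in function, the same workflow boils down to typing a single call at the suspicious spot. The following is a tiny sketch with a made-up function:

def apply_discount(order_total):
    # Pauses execution here; invokes pdb by default, or whatever
    # debugger the PYTHONBREAKPOINT environment variable points to
    breakpoint()
    return order_total * 0.9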

So why should you bother? Isn't that something of a personal preference? Are breakpoints something that ever get to production code?

The truth is that debugging is often a solitary and deeply personal task. We often spend numerous hours struggling with bugs, looking for clues, and reading code over and over in a desperate attempt to locate that small mistake that is breaking our application. When you're deeply focused on finding the cause of a problem, you should definitely use something that you find the most convenient. Some programmers prefer debuggers integrated into IDEs. Some programmers don't even use debuggers, preferring elaborate print() calls spread all over the code instead. Always choose whatever you find the most convenient.

But if you're used to a plain old shell-based debugger, breakpoint() can make your work easier. The main advantage of this function is that it isn't bound to a single debugger. By default, it invokes a pdb session, but this behavior can be modified with the PYTHONBREAKPOINT environment variable. If you prefer to use an alternative debugger (such as ipdb, as mentioned in Chapter 2, Modern Python Development Environments), you can set this environment variable to a value that will tell Python which function to invoke.

Standard practice is to set your preferred debugger in a shell profile script so that you don't have to modify this variable in every shell session. For instance, if you're a Bash user and want to always use ipdb instead of pdb, you could insert the following statement in your .bash_profile file:

export PYTHONBREAKPOINT=ipdb.set_trace

This approach also works well when working together. For instance, if someone asks for your help with debugging, you can ask them to insert breakpoint statements in suspicious places. That way, when you run the code on your own computer, you will be using the debugger of your choice.

If you don't know where to put your breakpoint, but the application exits with an unhandled exception, you can use the post-mortem feature of pdb. With the following command, you can start your Python script in a debugging session that will pause at the moment the exception is raised:

python3 -m pdb -c continue script.py

Development mode

Since version 3.7, the Python interpreter can be invoked in a dedicated development mode, which introduces additional runtime checks. These are helpful in diagnosing potential issues that may arise when running the code. In correctly working code, those checks would be unnecessarily expensive, so they are disabled by default.

Development mode can be enabled in two ways:

  • Using the -X dev command-line option of the Python interpreter, for instance:
    python -X dev my_application.py
    
  • Using the PYTHONDEVMODE environment variable, for instance:
    PYTHONDEVMODE=1 python my_application.py
    

The most important effects that this mode enables are as follows:

  • Memory allocation hooks: detecting buffer underflows and overflows, violations of the memory allocator API, and unsafe usage of the Global Interpreter Lock (GIL)
  • Import warnings issued in relation to possible mistakes when importing modules
  • Resource warnings issued in the case of improper handling of resources, for instance, not closing opened files
  • Deprecation warnings regarding elements of the standard library that have been deprecated and will be removed in future releases
  • Enabling a fault handler that outputs an application stack trace when the application receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS, or SIGILL system signals

Warnings emitted in development mode are indications that something does not work the way it should. They may be useful in finding problems that are not necessarily manifested as errors during the normal operation of your code, but may lead to tangible defects in the long term.

The improper cleanup of opened files may lead at some point to resource exhaustion of the environment your application is running in. File descriptors are resources, the same as RAM or disk storage. Every operating system has a limited number of files that can be opened at the same time. If your application is opening new files without closing them, at some point, it won't be able to open new ones.

Development mode enables you to identify such problems in advance. This is why it is advised to use this mode during application testing. Due to the additional overhead of checks enabled by development mode, it is not recommended to use this in production environments.
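
As an illustration, the following script leaks a file object. Running it in development mode, for instance with python -X dev leaky.py, should emit a ResourceWarning pointing at the offending allocation; this is a sketch, and the exact warning text varies between Python versions:

# leaky.py
def read_first_line(path):
    # The file object is opened but never explicitly closed
    return open(path).readline()

print(read_first_line(__file__))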

Sometimes, development mode can be used to diagnose existing problems, too. An example of a really problematic situation is when your application experiences a segmentation fault.

When this happens in Python, you usually won't get any details of the error, except the very brief message printed on your shell's standard output:

Segmentation fault: 11

When a segmentation fault occurs, the Python process receives a SIGSEGV system signal and terminates instantly. On some operating systems, you will receive a core dump, which is a snapshot of the process memory state recorded at the time of the crash. This can be used to debug your application. Unfortunately, in the case of CPython, this will be a memory snapshot of the interpreter process, so debugging will be taking place at the level of C code.

Development mode installs additional fault handler code that will output the Python stack trace whenever it receives a fault signal. Thanks to this, you will have a bit more information about which part of the code could lead to the problem. The following is an example of known code that will lead to a segmentation fault in Python 3.9:

import sys
sys.setrecursionlimit(1 << 30)
def crasher():
    return crasher()
crasher()

If you execute this with the Python interpreter using the -X dev flag, you will get output similar to the following:

Fatal Python error: Segmentation fault
Current thread 0x000000010b04edc0 (most recent call first):
  File "/Users/user/dev/crashers/crasher.py", line 6 in crasher
  File "/Users/user/dev/crashers/crasher.py", line 6 in crasher
  File "/Users/user/dev/crashers/crasher.py", line 6 in crasher
  File "/Users/user/dev/crashers/crasher.py", line 6 in crasher
  File "/Users/user/dev/crashers/crasher.py", line 6 in crasher
  ...

This fault handler can also be enabled outside of development mode. To do that, you can use the -X faulthandler command-line option or set the PYTHONFAULTHANDLER environment variable to 1.

It's not easy to cause a segmentation fault in pure Python. Such faults usually happen in Python extensions written in C or C++, or in functions called from shared libraries (such as DLL, .dylib, or .so objects). Still, there are some known and well-documented conditions under which this problem can occur in pure Python code. The repository of the CPython interpreter includes a collection of such known "crashers." It can be found at https://github.com/python/cpython/tree/master/Lib/test/crashers.

Module-level __getattr__() and __dir__() functions

Every Python class can define the custom __getattr__() and __dir__() methods to customize the dynamic attribute access of its objects. The __getattr__() method is invoked when a given attribute name is not found through the normal lookup, allowing you to capture the missing attribute access and possibly generate a value on the fly. The __dir__() method is called when an object is passed to the dir() function, and it should return a list of the object's attribute names.

Starting from Python 3.7, the __getattr__() and __dir__() functions can also be defined at the module level. The semantics are similar to those of the corresponding object methods. The module-level __getattr__() function, if defined, will be called on a failed module member lookup. The __dir__() function will be called when a module object is passed to the dir() function.

This feature may be useful for library maintainers when deprecating module functions or classes. Let's imagine that we exposed our get_ci() function from the Type-hinting generics section in an open source library called dict_helpers.py. If we would like to rename the function to ci_lookup() and still allow it to be imported under the old name, we could use the following deprecation pattern:

from typing import Any
from warnings import warn
def ci_lookup(d: dict[str, Any], key: str) -> Any:
    ...
def __getattr__(name: str):
    if name == "get_ci":
        warn(f"{name} is deprecated", DeprecationWarning)
        return ci_lookup
    raise AttributeError(f"module {__name__} has no attribute {name}")

The preceding pattern will emit DeprecationWarning, regardless of whether the get_ci() function is imported directly from a module (such as via from dict_helpers import get_ci) or accessed as a dict_helpers.get_ci attribute.

Deprecation warnings are not visible by default. You can enable them in development mode.

Formatting strings with f-strings

F-strings, also known as formatted string literals, are one of the most beloved Python features that came with Python 3.6. Introduced with PEP 498, they added a new way of formatting strings. Prior to Python 3.6, we already had two different string formatting methods. So right now, there are three different ways in which a single string can be formatted:

  • Using % formatting: This is the oldest method and uses a substitution pattern that resembles the syntax of the printf() function from the C standard library:
    >>> import math
    >>> "approximate value of π: %f" % math.pi
    'approximate value of π: 3.141593'
    
  • Using the str.format() method: This method is more convenient and less error-prone than % formatting, although it is more verbose. It enables the use of named substitution tokens as well as reusing the same value many times:
    >>> import math
    >>> "approximate value of π: {pi:f}".format(pi=math.pi)
    'approximate value of π: 3.141593'
    
  • Using formatted string literals (so-called f-strings): This is the most concise, flexible, and convenient option for formatting strings. It automatically substitutes values in literals using variables and expressions from the local namespace:
    >>> import math
    >>> f"approximate value of π: {math.pi:f}"
    'approximate value of π: 3.141593'
    

Formatted string literals are denoted with the f prefix, and their syntax is closest to that of the str.format() method, as they use similar markup for denoting replacement fields in the formatted text. In the str.format() method, the text substitutions refer to positional and keyword arguments. What makes f-strings special is that replacement fields can be any Python expression, and they will be evaluated at runtime. Inside strings, you have access to any variable that is available in the same namespace as the formatted literal.

The ability to use expressions as replacement fields makes formatting code simpler and shorter. You can also use the same formatting specifiers of replacement fields (for padding, aligning, signs, and so on) as the str.format() method, and the syntax is as follows:

f"{replacement_field_expression:format_specifier}"

The following is a simple example of code executed in an interactive session that prints the first ten powers of the number 10 using f-strings and aligns the results using string formatting with padding:

>>> for x in range(10):
...     print(f"10^{x} == {10**x:10d}")
... 
10^0 ==          1
10^1 ==         10
10^2 ==        100
10^3 ==       1000
10^4 ==      10000
10^5 ==     100000
10^6 ==    1000000
10^7 ==   10000000
10^8 ==  100000000
10^9 == 1000000000

The full string formatting specification forms a separate mini-language inside Python. The best reference source for this is the official documentation, which you can find at https://docs.python.org/3/library/string.html. Another useful internet resource regarding this topic is https://pyformat.info/, which presents the most important elements of this specification using practical examples.

Underscores in numeric literals

Underscores in numeric literals are probably the easiest feature to adopt, but they are still not as popular as they could be. Starting from Python 3.6, you can use the _ (underscore) character to separate digits in numeric literals. This makes long numbers much easier to read. Consider the following value assignment:

account_balance = 100000000

With so many zeros, it is hard to tell immediately whether we are dealing with millions or billions. You can instead use an underscore to separate thousands, millions, billions, and so on:

account_balance = 100_000_000

Now, it is easier to tell immediately that account_balance equals one hundred million without carefully counting the zeros.

secrets module

One of the most prevalent security mistakes made by programmers is expecting secure randomness from the random module. The nature of the random numbers generated by the random module is sufficient for statistical purposes. It uses the Mersenne Twister pseudorandom number generator, which has a known uniform distribution and a period length long enough that it can be used in simulations, modeling, or numerical integration.

However, Mersenne Twister is a completely deterministic algorithm. This means that by knowing its initial conditions (the seed number), you can reproduce the same pseudorandom numbers. Moreover, by knowing enough consecutive results of a pseudorandom generator, it is usually possible to recover the seed number and predict the next results. This is true for Mersenne Twister as well.

If you want to see how random numbers from Mersenne Twister can be predicted, you can review the following project on GitHub: https://github.com/kmyk/mersenne-twister-predictor.

That characteristic of pseudorandom number generators means that they should never be used for generating random values in a security context. For instance, if you need to generate a random secret that would be a user password or token, you should use a different source of randomness.

The secrets module serves exactly that purpose. It relies on the best source of randomness that a given operating system provides. So, on Unix and Unix-like systems, that would be the /dev/urandom device, and on Windows, it will be the CryptGenRandom generator.

The three most important functions are:

  • secrets.token_bytes(nbytes=None): This returns nbytes of random bytes. This function is used internally by secrets.token_hex() and secrets.token_urlsafe(). If nbytes is not specified, it will return a default number of bytes, which is documented as "reasonable."
  • secrets.token_hex(nbytes=None): This returns nbytes of random bytes in the form of a hex-encoded string (not a bytes() object). As it takes two hexadecimal digits to encode one byte, the resulting string will consist of nbytes × 2 characters. If nbytes is not specified, it will return the same default number of bytes as secrets.token_bytes().
  • secrets.token_urlsafe(nbytes=None): This returns nbytes of random bytes in the form of a URL-safe, base64-encoded string. As a single byte takes approximately 1.3 characters in base64 encoding, the resulting string will consist of nbytes × 1.3 characters. If nbytes is not specified, it will return the same default number of bytes as secrets.token_bytes().

Another important, but often overlooked, function is secrets.compare_digest(a, b). This compares two strings or byte-like objects in a way that does not allow an attacker to guess if they at least partially match by measuring how long it took to compare them. A comparison of two secrets using ordinary string comparison (the == operator) is susceptible to a so-called timing attack. In such a scenario, the attacker can try to execute multiple secret verifications and, by performing statistical analysis, gradually guess consecutive characters of the original value.
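
The following sketch shows typical usage of the module: generating tokens for a hypothetical password-reset flow and verifying a token received back from the user in constant time:

import secrets

# 32 random bytes rendered as 64 hexadecimal characters
session_token = secrets.token_hex(32)

# URL-safe variant, suitable for password-reset links
reset_token = secrets.token_urlsafe(32)

def verify_token(received: str) -> bool:
    # Constant-time comparison; never compare secrets with ==
    return secrets.compare_digest(received, reset_token)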

What may come in the future?

At the time of writing this book, Python 3.9 is still only a few months old, but the chances are that when you're reading this book, Python 3.10 has either already been released or is right around the corner.

As the Python development processes are open and transparent, we have constant insight into what has been accepted in the PEP documents and what has already been implemented in alpha and beta releases. This allows us to review selected features that will be introduced in Python 3.10. The following is a brief review of the most important changes that we can expect in the near future.

Union types with the | operator

Python 3.10 will bring yet another syntax simplification for the purpose of type hinting. Thanks to this new syntax, it will be easier to construct union-type annotations.

Python is dynamically typed and relies on duck typing, which is a form of polymorphism. As a result, a function can accept arguments of different types on different calls and process them properly, as long as those types share the same interface. To better understand this, let's bring back the signature of a function that allows case-insensitive lookup of string-keyed dictionary values:

def get_ci(d: dict[str, Any], key: str) -> Any: ...

Internally, we used the upper() method of keys obtained from the dictionary. That's the main reason why we defined the type of the d argument as dict[str, Any], and the type of key argument as str.

However, str is not the only built-in type that has the upper() method; bytes has it too. If we want our get_ci() function to accept both string-keyed and bytes-keyed dictionaries, we need to specify the union of possible types.

Currently, the only way to specify type unions is through the typing.Union hint. This hint allows the union of bytes and str types to be specified as typing.Union[bytes, str]. The complete signature of the get_ci() function would be as follows:

from typing import Any, Union

def get_ci(
    d: dict[Union[str, bytes], Any],
    key: Union[str, bytes]
) -> Any:
    ...

That is already verbose, and for more complex functions, it can only get worse. This is why Python 3.10 will allow unions of types to be specified using the | operator. In the future, you will be able to simply write the following:

def get_ci(d: dict[str | bytes, Any], key: str | bytes) -> Any: ...

In contrast to type-hinting generics, the introduction of a type union operator does not deprecate the typing.Union hint. This means that we will be able to use those two conventions interchangeably.
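To see the new syntax in context, the following is one possible implementation of get_ci() that is consistent with the signature above (the body is our own sketch, not the chapter's original definition, and it requires Python 3.10 for the | annotation):

from typing import Any

def get_ci(d: dict[str | bytes, Any], key: str | bytes) -> Any:
    # upper() exists on both str and bytes, which is exactly why
    # the union type fits: any key with that interface will do.
    for k, value in d.items():
        if k.upper() == key.upper():
            return value
    raise KeyError(key)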

Structural pattern matching

Structural pattern matching is definitely the most controversial new Python feature of the last decade, and arguably the most complex one.

The acceptance of that feature was preceded by numerous heated debates and countless design drafts. The complexity of the topic is clearly visible if we look at all the PEP documents that have tried to tackle the problem. The following table lists all the PEP documents related to structural pattern matching (statuses accurate as of March 2021):

Date         PEP   Title                                                     Type             Status
23-Jun-2020  622   Structural Pattern Matching                               Standards Track  Superseded by PEP 634
12-Sep-2020  634   Structural Pattern Matching: Specification                Standards Track  Accepted
12-Sep-2020  635   Structural Pattern Matching: Motivation and Rationale     Informational    Final
12-Sep-2020  636   Structural Pattern Matching: Tutorial                     Informational    Final
26-Sep-2020  642   Explicit Pattern Syntax for Structural Pattern Matching   Standards Track  Draft
9-Feb-2021   653   Precise Semantics for Pattern Matching                    Standards Track  Draft

That's a lot of documents, and none of them are short. So, what is structural pattern matching and how can it be useful?

Structural pattern matching introduces a match statement and two new soft keywords: match and case. As the name suggests, it can be used to match a given value against a list of specified "cases" and act accordingly to the match.

A soft keyword is a keyword that is not reserved in every context. Both match and case can be used as ordinary variables or function names outside the match statement context.
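As a quick (if admittedly confusing) illustration of what "soft" means, the following is perfectly valid Python 3.10 code:

match = "case"    # outside a match statement, match is just a name
case = ["match"]  # and so is case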

For some programmers, the syntax of the match statement resembles the syntax of the switch statement found in languages such as C, C++, Pascal, Java, and Go. It can indeed be used to implement the same programming pattern, but is definitely much more powerful.

The general (and simplified) syntax for a match statement is as follows:

match expression:
    case pattern:
        ... 

expression can be any valid Python expression. pattern represents a matching pattern, which is a concept new to Python. Inside a case block, you can have multiple statements. The complexity of the match statement stems mostly from the introduction of match patterns, which may initially be hard to understand. Patterns can also easily be confused with expressions, but they are not evaluated the way ordinary expressions are.

But before we dig into the details of match patterns, let's take a look at a simple example of a match statement that replicates the functionality of switch statements from different programming languages:

import sys

match sys.platform:
    case "win32":  # sys.platform is "win32" on Windows, not "windows"
        print("Running on Windows")
    case "darwin":
        print("Running on macOS")
    case "linux":
        print("Running on Linux")
    case _:
        raise NotImplementedError(
            f"{sys.platform} not supported!"
        )

This is, of course, a very straightforward example, but it already shows some important elements. First, we can use literals as patterns. Second, there is a special _ (underscore) wildcard pattern. Wildcard patterns, and other patterns that can be proven from the syntax alone to always match, create an irrefutable case block. An irrefutable case block can be placed only as the last block of a match statement.

The previous example could, of course, be implemented with a simple chain of if, elif, and else statements, but match statements start to pay off when the matching becomes more structured. A common entry-level recruitment challenge is writing a FizzBuzz program.

A FizzBuzz program iterates from 0 to an arbitrary number and, depending on the value, does one of four things:

  • It prints Fizz if the value is divisible by 3
  • It prints Buzz if the value is divisible by 5
  • It prints FizzBuzz if the value is divisible by 3 and 5
  • It prints the value in all other cases

This is indeed a minor problem, but you would be surprised how often people stumble on even the simplest things under the stress of an interview. It can, of course, be solved with a couple of if statements, but using a match statement gives the solution some natural elegance:

for i in range(100):
    match (i % 3, i % 5):
        case (0, 0): print("FizzBuzz")
        case (0, _): print("Fizz")
        case (_, 0): print("Buzz")
        case _: print(i)

In the preceding example, we match (i % 3, i % 5) in every iteration of the loop. We need both modulo divisions because the result of each iteration depends on both of them. A match statement stops evaluating patterns once it finds a matching case and executes only that single block of code.

The notable difference from the previous example is that we used mostly sequence patterns instead of literal patterns:

  • The (0, 0) pattern: This will match a two-element sequence if both elements are equal to 0.
  • The (0, _) pattern: This will match a two-element sequence if the first element is equal to 0. The other element can be of any value and type.
  • The (_, 0) pattern: This will match a two-element sequence if the second element is equal to 0. The other element can be of any value and type.
  • The _ pattern: This is a wildcard pattern that will match all values.

Match patterns aren't limited to simple literals and sequences of literals. You can also match against specific classes and, with class patterns, things start to get really magical. That's definitely the most complex part of the whole feature.

At the time of writing, Python 3.10 hasn't yet been released, so it's hard to show a typical and practical use case for class matching patterns. So instead, we will take a look at an example from an official tutorial. The following is a modified example from the PEP 636 document that includes a simple where_is() function, which can match against the structure of the Point class instance provided:

class Point:
    x: int
    y: int

    def __init__(self, x, y):
        self.x = x
        self.y = y


def where_is(point):
    match point:
        case Point(x=0, y=0):
            print("Origin")
        case Point(x=0, y=y):
            print(f"Y={y}")
        case Point(x=x, y=0):
            print(f"X={x}")
        case Point():
            print("Somewhere else")
        case _:
            print("Not a point")

A lot is happening in the preceding example, so let's iterate over all the patterns included here:

  • Point(x=0, y=0): This matches if point is an instance of the Point class and its x and y attributes are equal to 0.
  • Point(x=0, y=y): This matches if point is an instance of the Point class and its x attribute is equal to 0. The y attribute is captured to the y variable, which can be used within the case block.
  • Point(x=x, y=0): This matches if point is an instance of the Point class and its y attribute is equal to 0. The x attribute is captured to the x variable, which can be used within the case block.
  • Point(): This matches if point is an instance of the Point class.
  • _: This always matches.

As you can see, pattern matching can look deep into object attributes. Despite the Point(x=0, y=0) pattern looking like a constructor call, Python does not call an object constructor when evaluating patterns. It also doesn't inspect arguments and keyword arguments of __init__() methods, so you can access any attribute value in your match pattern.
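A few hypothetical calls show how these patterns play out (the expected output is noted in comments; this assumes a Python 3.10 interpreter):

where_is(Point(0, 0))    # prints: Origin
where_is(Point(0, 3))    # prints: Y=3
where_is(Point(5, 0))    # prints: X=5
where_is(Point(2, 2))    # prints: Somewhere else
where_is("not a point")  # prints: Not a point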

Match patterns can also use "positional attribute" syntax, but that requires a bit more work. You simply need to provide an additional __match_args__ class attribute that specifies the natural position order of class instance attributes, as in the following example:

class Point:
    x: int
    y: int

    # __match_args__ must be a tuple of attribute names
    __match_args__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def where_is(point):
    match point:
        case Point(0, 0):
            print("Origin")
        case Point(0, y):
            print(f"Y={y}")
        case Point(x, 0):
            print(f"X={x}")
        case Point():
            print("Somewhere else")
        case _:
            print("Not a point")

And that's just the tip of the iceberg. Match statements are far more complex than we could demonstrate in this short section. If we were to consider all the possible use cases, syntax variants, and corner cases, we could easily discuss them for the rest of the chapter. If you want to learn more about them, you should definitely read through the three "canonical" PEPs: 634, 635, and 636.

Summary

In this chapter, we've covered the most important language syntax and standard library changes that have happened over the last four versions of Python. If you're not actively following Python release notes or haven't yet transitioned to Python 3.9, this should give you enough information to be up to date.

In this chapter, we've also introduced the concept of programming idioms. This is an idea that we will be referring to multiple times throughout the book. In the next chapter, we will take a closer look at many Python idioms by comparing selected features of Python to different programming languages. If you are a seasoned programmer who has just recently transitioned to Python, this will be a great opportunity to learn the "Python way of doing things." It will also be an opportunity to see where Python really shines, and where it might still be behind the competition.
